ML-SUPERB 2.0


The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties

Chen, William, Meng, Chutong, Shi, Jiatong, Bartelds, Martijn, Wang, Shih-Heng, Wang, Hsiu-Hsuan, Mosquera, Rafael, Hincapie, Sara, Jurafsky, Dan, Anastasopoulos, Antonis, Lee, Hung-yi, Livescu, Karen, Watanabe, Shinji

arXiv.org Artificial Intelligence

Recent improvements in multilingual ASR have not been equally distributed across languages and language varieties. To advance state-of-the-art (SOTA) ASR models, we present the Interspeech 2025 ML-SUPERB 2.0 Challenge. We construct a new test suite that consists of data from 200+ languages, accents, and dialects to evaluate SOTA multilingual speech models. The challenge also introduces an online evaluation server based on DynaBench, allowing for flexibility in model design and architecture for participants. The challenge received 5 submissions from 3 teams, all of which outperformed our baselines. The best-performing submission achieved an absolute improvement in LID accuracy of 23% and a reduction in CER of 18% when compared to the best baseline on a general multilingual test set. On accented and dialectal data, the best submission obtained 30.2% lower CER and 15.7% higher LID accuracy, showing the importance of community challenges in making speech technologies more inclusive.
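
The challenge scores submissions on the two metrics named above: character error rate (CER) for ASR and accuracy for LID. As a point of reference, here is a minimal, self-contained sketch of both metrics; the function names are illustrative and not part of the challenge tooling.

```python
# Illustrative implementations of the two challenge metrics (not the
# official scoring code): CER via character-level Levenshtein distance,
# and plain LID accuracy.

def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters, single-row dynamic programming."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                             # deletion
                        dp[j - 1] + 1,                         # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))     # substitution
            prev = cur
    return dp[n]

def cer(refs: list[str], hyps: list[str]) -> float:
    """CER = total character edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / max(chars, 1)

def lid_accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of utterances whose predicted language matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```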


Geolocation-Aware Robust Spoken Language Identification

Wang, Qingzheng, Shim, Hye-jin, Sun, Jiancheng, Watanabe, Shinji

arXiv.org Artificial Intelligence

While Self-supervised Learning (SSL) has significantly improved Spoken Language Identification (LID), existing models often struggle to consistently classify dialects and accents of the same language as a unified class. To address this challenge, we propose geolocation-aware LID, a novel approach that incorporates language-level geolocation information into the SSL-based LID model. Specifically, we introduce geolocation prediction as an auxiliary task and inject the predicted vectors into intermediate representations as conditioning signals. This explicit conditioning encourages the model to learn more unified representations for dialectal and accented variations. Experiments across six multilingual datasets demonstrate that our approach improves robustness to intra-language variations and unseen domains, achieving new state-of-the-art accuracy on FLEURS (97.7%) and a 9.7% relative improvement on the ML-SUPERB 2.0 dialect set.
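
In code, the abstract's two ingredients are an auxiliary geolocation head and an injection of its prediction back into the intermediate features. A minimal PyTorch sketch follows, assuming a 2-D latitude/longitude target, additive injection, and mean pooling; all layer choices are illustrative and not taken from the paper.

```python
# Hedged sketch of geolocation-aware LID: an auxiliary head predicts a
# geolocation vector, which is projected back and added to the intermediate
# SSL representation as a conditioning signal. Sizes and the additive
# injection are illustrative assumptions.

import torch
import torch.nn as nn

class GeoConditionedLID(nn.Module):
    def __init__(self, feat_dim: int, n_langs: int, geo_dim: int = 2):
        super().__init__()
        self.geo_head = nn.Linear(feat_dim, geo_dim)   # auxiliary geolocation predictor
        self.geo_proj = nn.Linear(geo_dim, feat_dim)   # maps prediction back to feature space
        self.encoder = nn.TransformerEncoderLayer(feat_dim, nhead=8, batch_first=True)
        self.lid_head = nn.Linear(feat_dim, n_langs)

    def forward(self, ssl_feats: torch.Tensor):
        # ssl_feats: (batch, time, feat_dim) intermediate SSL representations
        pooled = ssl_feats.mean(dim=1)
        geo_pred = self.geo_head(pooled)               # (batch, geo_dim)
        # Inject the predicted geolocation into the intermediate representation.
        conditioned = ssl_feats + self.geo_proj(geo_pred).unsqueeze(1)
        hidden = self.encoder(conditioned)
        lid_logits = self.lid_head(hidden.mean(dim=1))
        return lid_logits, geo_pred
```

Training would then combine the LID cross-entropy with an auxiliary geolocation regression loss, e.g. `loss = ce(lid_logits, lang) + lam * mse(geo_pred, latlon)`, with the weighting left as a hyperparameter.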


Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC

Wang, Qingzheng, Sun, Jiancheng, Peng, Yifan, Watanabe, Shinji

arXiv.org Artificial Intelligence

Multilingual speech processing with self-supervised or supervised pre-trained Speech Foundation Models (SFMs) has achieved strong performance on tasks like Language Identification (LID) and Automatic Speech Recognition (ASR). However, these models struggle when fine-tuning resources are limited. This paper enhances multilingual LID and ASR on ML-SUPERB 2.0 by exploring multiple strategies for adapting SFMs, including frozen upstream training, partial fine-tuning, and low-rank adaptation. Furthermore, we employ data augmentation to mitigate performance gaps in few-shot settings and introduce LID Connectionist Temporal Classification (CTC) loss for regularization. Our approach achieves a 14% relative improvement in LID accuracy and a 30% relative reduction in ASR CER over the baseline on ML-SUPERB 2.0, securing second place in the Interspeech 2025 ML-SUPERB 2.0 Challenge.
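
The LID CTC regularizer can be pictured as a second CTC head whose "transcript" for every utterance is a single language token. A hedged PyTorch sketch, assuming a shared encoder feeding both heads; the head design, loss weight, and the convention that language indices start at 1 (index 0 being the CTC blank) are illustrative assumptions rather than the authors' code.

```python
# Sketch of a joint ASR + LID CTC loss: the usual character-level CTC term
# plus an auxiliary CTC term whose target sequence is the single language
# token of the utterance. Weighting and shapes are illustrative.

import torch
import torch.nn.functional as F

def joint_asr_lid_ctc_loss(
    asr_log_probs,    # (T, B, vocab) log-softmax outputs of the ASR head
    lid_log_probs,    # (T, B, n_langs + 1) log-softmax outputs of the LID head
    asr_targets,      # (B, S) padded character targets
    asr_target_lens,  # (B,) target lengths
    lang_ids,         # (B,) language indices in 1..n_langs (0 is the CTC blank)
    input_lens,       # (B,) encoder output lengths
    lam: float = 0.3, # assumed regularization weight
):
    asr_loss = F.ctc_loss(asr_log_probs, asr_targets, input_lens, asr_target_lens)
    # Each utterance's LID "transcript" is exactly one language token.
    lid_targets = lang_ids.unsqueeze(1)            # (B, 1)
    lid_target_lens = torch.ones_like(lang_ids)
    lid_loss = F.ctc_loss(lid_log_probs, lid_targets, input_lens, lid_target_lens)
    return asr_loss + lam * lid_loss
```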


TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge

Alumäe, Tanel, Fedorchenko, Artem

arXiv.org Artificial Intelligence

This paper describes the language identification and multilingual speech recognition system developed at Tallinn University of Technology for the Interspeech 2025 ML-SUPERB 2.0 Challenge. A hybrid language identification system is used, consisting of a pretrained language embedding model and a lightweight speech recognition model with a shared encoder across languages and language-specific bigram language models. For speech recognition, three models are used, with a single model applied per language depending on training data availability and performance on held-out data. The model set consists of a fine-tuned version of SeamlessM4T, MMS-1B-all with custom language adapters, and MMS-zeroshot. The system obtained the top overall score in the challenge.
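
A minimal sketch of the per-language routing this implies, assuming a simple lookup table: each language is mapped offline to exactly one of the three recognizers based on held-out results. The language codes, table contents, and `decode` API here are hypothetical placeholders, not the paper's implementation.

```python
# Illustrative per-language model routing: one recognizer per language,
# chosen in advance from held-out performance. All entries are hypothetical.

ASR_ROUTING = {
    "lang-high": "seamless_m4t_finetuned",  # enough data: fine-tuned SeamlessM4T
    "lang-mid": "mms_1b_all_adapter",       # MMS-1B-all with a custom language adapter
    "lang-low": "mms_zeroshot",             # little or no training data: MMS-zeroshot
}

def transcribe(audio, lang_id: str, models: dict):
    """Route the utterance to the single model assigned to its language."""
    model_name = ASR_ROUTING.get(lang_id, "mms_zeroshot")  # assumed fallback
    return models[model_name].decode(audio, lang=lang_id)  # hypothetical decode API
```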


CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition

Bartelds, Martijn, Nandi, Ananjan, Doumbouya, Moussa Koulako Bala, Jurafsky, Dan, Hashimoto, Tatsunori, Livescu, Karen

arXiv.org Artificial Intelligence

Modern deep learning models often achieve high overall performance, but consistently fail on specific subgroups. Group distributionally robust optimization (group DRO) addresses this problem by minimizing the worst-group loss, but it fails when group losses misrepresent performance differences between groups. This is common in domains like speech, where the widely used connectionist temporal classification (CTC) loss scales with input length and varies with linguistic and acoustic properties, leading to spurious differences between group losses. We present CTC-DRO, which addresses the shortcomings of the group DRO objective by smoothing the group weight update to prevent overemphasis on consistently high-loss groups, while using input length-matched batching to mitigate CTC's scaling issues. We evaluate CTC-DRO on the task of multilingual automatic speech recognition (ASR) across five language sets from the ML-SUPERB 2.0 benchmark. CTC-DRO consistently outperforms group DRO and CTC-based baseline models, reducing the worst-language error by up to 65.9% and the average error by up to 47.7%. CTC-DRO can be applied to ASR with minimal computational costs, and offers the potential for reducing group disparities in other domains with similar challenges.
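
The core of the method is the group weight update. Below is a hedged PyTorch sketch of a group-DRO-style exponentiated-gradient update with a damped (smoothed) step in the spirit the abstract describes; the specific smoothing shown (normalizing each group's loss by a running per-group average plus a constant) and all hyperparameters are illustrative assumptions, not the paper's exact objective, and length-matched batching is assumed to happen upstream in the data loader.

```python
# Sketch of a smoothed group-DRO weight update. The baseline rule is the
# standard exponentiated-gradient step w_g <- w_g * exp(eta * L_g); the
# damping via a running per-group average loss is an illustrative choice
# to keep consistently high-loss groups from monopolizing the weights.

import torch

class SmoothedGroupWeights:
    def __init__(self, n_groups: int, eta: float = 0.1, smooth_c: float = 1.0):
        self.w = torch.full((n_groups,), 1.0 / n_groups)  # weights form a distribution
        self.avg_loss = torch.zeros(n_groups)             # running per-group average loss
        self.eta, self.smooth_c = eta, smooth_c

    def update(self, group_losses: torch.Tensor) -> torch.Tensor:
        # Track a running average so the update sees relative, not absolute, loss.
        self.avg_loss = 0.9 * self.avg_loss + 0.1 * group_losses.detach()
        scaled = group_losses.detach() / (self.avg_loss + self.smooth_c)
        self.w = self.w * torch.exp(self.eta * scaled)    # exponentiated-gradient step
        self.w = self.w / self.w.sum()                    # renormalize
        # Weighted training loss: emphasizes currently worst-performing groups.
        return (self.w * group_losses).sum()
```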


ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Shi, Jiatong, Wang, Shih-Heng, Chen, William, Bartelds, Martijn, Kumar, Vanya Bannihatti, Tian, Jinchuan, Chang, Xuankai, Jurafsky, Dan, Livescu, Karen, Lee, Hung-yi, Watanabe, Shinji

arXiv.org Artificial Intelligence

ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. We find performance improvements over the original ML-SUPERB setup, although performance depends on the downstream model design. We also find large performance differences between languages and datasets, suggesting the need for more targeted approaches to improve multilingual ASR performance.
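
The contrast between the original ML-SUPERB setup and the 2.0 configurations can be pictured as frozen upstream plus shallow head versus a fine-tuned upstream. A minimal sketch, assuming a generic `upstream` SSL module that returns frame-level features and an illustrative two-layer head:

```python
# Sketch of the two evaluation regimes: frozen SSL feature extractor with a
# shallow downstream head (original ML-SUPERB style) versus full fine-tuning.
# The head architecture and sizes are illustrative assumptions.

import torch.nn as nn

def build_probe(upstream: nn.Module, feat_dim: int, vocab: int, finetune: bool):
    for p in upstream.parameters():
        p.requires_grad = finetune       # frozen extractor vs. full fine-tuning
    downstream = nn.Sequential(          # shallow head in the ML-SUPERB style
        nn.Linear(feat_dim, 256),
        nn.ReLU(),
        nn.Linear(256, vocab),           # per-frame outputs, e.g. for CTC decoding
    )
    # Assumes upstream maps waveforms to (batch, time, feat_dim) features.
    return nn.Sequential(upstream, downstream)
```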